An example of how optimizing for short-term rewards can make you weaker
https://gyazo.com/597878edc889a3c2489d01be73177041
@tsukammo: This is what happens when the evaluation function is based on immediate rewards alone, which is why the common "life hacks" are to add "curiosity" to the evaluation function or to "prepare rewards by chopping the task into small steps". Yeah, I know all that. I just don't do it.
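To make the failure mode and the two "life hacks" concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration, not something taken from the image or the tweet: a hypothetical seven-state corridor with a small +1 reward one step away and a large +10 reward five steps away, a greedy agent that only looks one step ahead, a hand-picked shaping potential phi(s) = s for "rewards chopped into small steps", and a count-based curiosity bonus with an arbitrary weight beta = 5.

```python
import math
from typing import Callable, Dict, Optional, Tuple

# Hypothetical toy environment (an assumption, not from the post): a corridor
# of states 0..6 where the agent starts at state 1. Entering state 0 ends the
# episode with a small reward (+1, one step away); entering state 6 ends it
# with a large reward (+10, five steps away). Every other move pays 0.
START = 1
TERMINAL_REWARD: Dict[int, float] = {0: 1.0, 6: 10.0}
ACTIONS = (+1, -1)  # right is listed first, so ties are broken toward the right

Score = Callable[[int, int, float], float]  # (state, next_state, reward) -> value


def step(state: int, action: int) -> Tuple[int, float, bool]:
    """Deterministic transition: returns (next_state, reward, done)."""
    nxt = state + action
    return nxt, TERMINAL_REWARD.get(nxt, 0.0), nxt in TERMINAL_REWARD


def run_episode(score: Score, visits: Optional[Dict[int, int]] = None,
                max_steps: int = 20) -> float:
    """Greedy one-step agent: always take the action with the highest score.

    Returns the total *environment* reward, so different scores are compared
    on the same scale. If `visits` is given, every visited state is counted.
    """
    state, total = START, 0.0
    if visits is not None:
        visits[state] = visits.get(state, 0) + 1
    for _ in range(max_steps):
        candidates = []
        for action in ACTIONS:
            nxt, reward, done = step(state, action)
            candidates.append((score(state, nxt, reward), nxt, reward, done))
        # max() returns the first maximal item, so ties follow ACTIONS order.
        _, state, reward, done = max(candidates, key=lambda c: c[0])
        if visits is not None:
            visits[state] = visits.get(state, 0) + 1
        total += reward
        if done:
            break
    return total


def immediate(state: int, nxt: int, reward: float) -> float:
    """Evaluation based on the direct reward alone."""
    return reward


def shaped(state: int, nxt: int, reward: float) -> float:
    """Rewards chopped into small steps: potential-based shaping with an
    assumed potential phi(s) = s and gamma = 1, i.e. +1 per step toward
    state 6 and -1 per step away from it."""
    return reward + nxt - state


def make_curious(visits: Dict[int, int], beta: float = 5.0) -> Score:
    """Count-based curiosity: bonus beta / sqrt(1 + N(s')) for rarely seen states."""
    def curious(state: int, nxt: int, reward: float) -> float:
        return reward + beta / math.sqrt(1 + visits.get(nxt, 0))
    return curious


if __name__ == "__main__":
    print("immediate:", run_episode(immediate))  # 1.0
    print("shaped:   ", run_episode(shaped))     # 10.0
    visits: Dict[int, int] = {}
    curious = make_curious(visits)
    print("curious:  ", [run_episode(curious, visits) for _ in range(2)])  # [1.0, 10.0]
```

With the immediate reward alone, the agent grabs the nearby +1 every time. With shaping, each step toward the far goal already pays, so it walks straight to the +10. With the curiosity bonus, the first episode still takes the +1, but in the second episode the already-visited state looks less attractive than the unexplored corridor, and the agent stumbles onto the +10. Note that this bonus is a pure exploration term with no learned value estimate, so on its own it does not keep the agent going right in later episodes; in practice such bonuses are added to the reward of an agent that is also learning values.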
---
This page is auto-translated from /nishio/短期的報酬に最適化すると弱くなる例 using DeepL. If you find something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.